Adapting to Learner Errors with Minimal Supervision
Abstract
This article considers the problem of correcting errors made by English as a Second Language writers from a machine learning perspective, and addresses an important issue of developing an appropriate training paradigm for the task, one that accounts for error patterns of non-native writers using minimal supervision. Existing training approaches present a trade-off between large amounts of cheap data offered by the native-trained models and additional knowledge of learner error patterns provided by the more expensive method of training on annotated learner data. We propose a novel training approach that draws on the strengths offered by the two standard training paradigms—of training either on native or on annotated learner data—and that outperforms both of these standard methods. Using the key observation that parameters relating to error regularities exhibited by non-native writers are relatively simple, we develop models that can incorporate knowledge about error regularities based on a small annotated sample but that are otherwise trained on native English data. The key contribution of this article is the introduction and analysis of two methods for adapting the learned models to error patterns of non-native writers; one method that applies to generative classifiers and a second that applies to discriminative classifiers. Both methods demonstrated state-of-the-art performance in several text correction competitions. In particular,
Similar resources
Autonomously adapting range data patterns for object detection
We present a novel approach to recognizing patterns in laser range data that performs on a par with the state of the art while at the same time requiring minimal parameters and supervision. Most importantly, supervision is only needed at the level of real-world objects that a robot can interact with (humans, in our experiments). This is an important step towards autonomously cognitive systems, since...
Identifying the Factors Affecting the Incidence of Medication Errors of Nurses in Teaching Hospitals of Shiraz University of Medical Sciences
Background: Medication errors are one of the major causes of injury to patients while receiving medical care. This study aimed to investigate the effective causes of medication errors in nurses in educational hospitals affiliated with Shiraz University of Medical Sciences. Methods: This descriptive-analytical and cross-sectional study was conducted in 2020 on 340 nurses from 10 educational ...
Improving distant supervision using inference learning
Distant supervision is a widely applied approach to automatic training of relation extraction systems and has the advantage that it can generate large amounts of labelled data with minimal effort. However, this data may contain errors and consequently systems trained using distant supervision tend not to perform as well as those based on manually labelled data. This work proposes a novel method...
Automatic Linguistic Annotation of Large Scale L2 Databases: The EF-Cambridge Open Language Database (EFCamDat)
Naturalistic learner productions are an important empirical resource for SLA research. Some pioneering works have produced valuable second language (L2) resources supporting SLA research. One common limitation of these resources is the absence of individual longitudinal data for numerous speakers with different backgrounds across the proficiency spectrum, which is vital for understanding the ...
Detecting and Correcting Learner Korean Particle Omission Errors
We detect errors in Korean post-positional particle usage, focusing on optimizing omission detection, as omissions are the single-biggest factor in particle errors for learners of Korean. We also develop a system for predicting the correct choice of a particle. For omission detection, we model the task largely on English grammatical error detection, but employ Korean-specific features and filte...
Journal title:
- Computational Linguistics
Volume 43
Publication date: 2017